STATISTICAL SIGNIFICANCE AND BASELINE MONITORING


Raymond W. Alden*


INTRODUCTION 


Scientists developing environmental monitoring programs must consider the ultimate question: Has a significant impact occurred? This question is a primary concern of environmentalists and regulatory agencies alike. Therefore, the investigator must design baseline and trend assessment studies in such a way as to allow detection of environmental impacts. However, to address this question properly, the investigator should have a basic definition of a "significant" impact. Typically, the definition is approached as two components: 1) What is a statistical impact? and 2) What is an ecological impact? For an impact to be considered ecologically significant, it should also be statistically significant. However, the converse is not necessarily true. In fact, it would be desirable to design a monitoring program that allows the statistical detection of ecological changes before they become critical.


The present study offers strategies for defining statistical impacts for an environmental monitoring program. Specifically, a series of statistical techniques has been developed to estimate "minimum detectable impacts" (MDIs) for variables examined during the baseline phase of a monitoring program at an open ocean dredged material disposal site. The MDIs depend upon the natural spatiotemporal variability of the baseline data and the intensity of the monitoring effort.


*Director, Applied Marine Research Laboratory, Old Dominion University, Norfolk, Virginia 23508.
EVALUATION AND DISCUSSION OF METHODS


MDIs  of  Single  Samples 

One of the most basic questions that may be considered is how different a single sample must be before a statistically significant impact can be inferred. Green (1979) considers a number of statistical methods for evaluating environmental data. One approach that Green suggests for detecting statistically significant outliers involves evaluating the samples against the variance-covariance relationships of the baseline data set. The method involves a chi-square ($\chi^2$) test of a sample of $p$ variables, employing the following equation:


$$\chi^2 = (X_i - \bar{X}_j)'\, D^{-1}\, (X_i - \bar{X}_j) \qquad (1)$$

where $X_i$ is the value of the new observation, $\bar{X}_j$ is the mean or expected value of the $j$th variable, and $D$ is the variance-covariance matrix.

If the sample being tested is sufficiently divergent from the original data set, equation (1) will produce a value greater than the critical value of $\chi^2$ for $p$ degrees of freedom, where $p$ is the number of variables. This equation can also be used to predict the MDI levels for single samples.
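As a minimal sketch of equation (1), assuming the baseline means and the variance-covariance matrix $D$ have already been estimated, the statistic can be computed by solving $D y = (X_i - \bar{X}_j)$ rather than inverting $D$ explicitly. The function names and the caller-supplied critical value (for example, about 9.21 for 2 degrees of freedom at $\alpha = 0.01$) are illustrative assumptions, not part of the original program.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy, so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def chi_square_stat(x_new: Sequence[float], means: Sequence[float], cov: Matrix) -> float:
    """(X_i - Xbar)' D^-1 (X_i - Xbar), computed without forming D^-1."""
    d = [xi - mi for xi, mi in zip(x_new, means)]
    y = solve(cov, d)  # y = D^-1 d
    return sum(di * yi for di, yi in zip(d, y))


def is_outlier(x_new: Sequence[float], means: Sequence[float], cov: Matrix,
               critical: float) -> bool:
    """True when the new sample exceeds the chi-square critical value."""
    return chi_square_stat(x_new, means, cov) > critical
```

For two uncorrelated variables with variances 1 and 4 and means of zero, the sample (2, 2) gives $\chi^2 = 4/1 + 4/4 = 5$, which is below 9.21 and so would not be flagged.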

A computer program was developed to add or subtract factors (i.e., small percentages of the means) incrementally to the means of each variable. This program was coupled with the $\chi^2$ test (equation 1) to evaluate iteratively the effects of increasing or decreasing the values of each variable. Each variable was evaluated sequentially, with all other variables held constant and equal to their means. The factor for the variable being tested was changed incrementally until a significant $\chi^2$ value was attained. The values of the factors that just produced significant values for each variable were considered to be the MDI levels.
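The incremental search just described can be sketched as follows. The step size of 5% of the mean, the upper limit on the factor, and the helper names are hypothetical choices for illustration; the critical value is again supplied by the caller, and the sign of `direction` follows the convention that oxygen, pH and pigments decrease while the other variables increase.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def chi_square_stat(x_new: Sequence[float], means: Sequence[float], cov: Matrix) -> float:
    """Equation (1): (X_i - Xbar)' D^-1 (X_i - Xbar)."""
    d = [xi - mi for xi, mi in zip(x_new, means)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


def single_sample_mdi(means: Sequence[float], cov: Matrix, j: int, direction: int,
                      critical: float, step: float = 0.05,
                      max_factor: float = 50.0) -> Optional[float]:
    """Smallest factor (a multiple of `step`, as a proportion of mean j) that
    makes equation (1) significant when only variable j is changed.

    direction = +1 for variables expected to increase after an impact,
    -1 for variables expected to decrease. Returns None if no factor up
    to max_factor is significant.
    """
    k = 1
    while k * step <= max_factor + 1e-12:
        factor = k * step
        x = list(means)  # all other variables stay at their means
        x[j] = means[j] * (1.0 + direction * factor)
        if chi_square_stat(x, means, cov) > critical:
            return factor
        k += 1
    return None
```

Because only one variable moves, the statistic here equals $\delta^2 (D^{-1})_{jj}$ for a shift $\delta$, so the loop could be replaced by a closed form. The loop is kept because it mirrors the incremental procedure and rounds each MDI up to the nearest step.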

To provide an example of how this method may be used, a subset of data from a baseline water quality monitoring program at a potential open ocean dredged material disposal site was subjected to the MDI evaluation process. Data from two cruises taken in late summer (August) and early autumn (October) were selected from a three-year baseline program (1981-1983). These data would be expected to exhibit the sort of natural spatiotemporal variability that may be observed for any given seasonal period. The means, the values of the simulated "impacted" sample, and the MDI factors estimated for each of the water quality variables are presented in Table I. Parameters that may be expected to decrease when impacted by dredged material disposal operations (e.g., dissolved oxygen, pH, plant pigments) had the factors subtracted from the means, while those expected to increase had the factors added.

Although there was a fairly wide range of MDI levels, from 5% to over 400%, none of the "impacted" values was at a level considered to be extremely harmful ecologically. The parameters with the greatest relative MDIs were those found at extremely low levels, often near detection limits. Therefore, the absolute concentrations of the "impacted" samples, although significantly different from the baseline means, were still moderately low. In fact, few of the "impacted" values fell outside the natural range reported by Kester and Courant (1973) for estuarine Chesapeake Bay waters, and none approached the water quality criteria or reference levels recommended by state and federal agencies for the protection of marine life or the prevention of eutrophication (Virginia State Water Control Board, 1976). Thus, the approach allowed the detection of a statistically significant impact at a level below that which may be of acute ecological significance. Of course, this is the desired situation if the trend assessment is to act as an "early warning system" for the detection of an impact before the environment deteriorates excessively.


Table I. Results of MDI analysis for single samples. Means, standard errors, estimated values of the "impacted" sample, and MDI levels are presented for each parameter.

VARIABLE                         UNIT   MEAN     STANDARD   "IMPACTED"   M.D.I.
                                                 ERROR      VALUE        (% OF MEAN)
Dissolved Oxygen                 mg/l   8.78     0.21       6.15         -30
pH                               -      7.85     0.01       7.46         -5
C.O.D.                           mg/l   30.60    1.49       7.46         +105
Turbidity                        NTU    1.40     0.10       2.03         +145
Nitrate-Nitrite                  µg/l   0.30     0.07       1.53         +410
Orthophosphate                   mg/l   0.002    0.0005     0.009        +352
Total Phosphorus                 mg/l   0.009    0.002      0.035        +290
TKN                              mg/l   0.133    0.005      0.259        +95
Ammonia                          mg/l   0.092    0.005      0.212        +130
Suspended Solids                 mg/l   12.01    0.59       22.22        +85
Volatile Nonfilterable Residue   mg/l   3.29     0.21       6.91         +110
Chlorophyll a                    mg/l   4.31     0.14       3.88         -10
Chlorophyll a (corrected)        mg/l   4.25     0.15       3.61         -15
Chlorophyll b                    mg/l   0.51     0.02       0.18         -65
Chlorophyll c                    mg/l   2.19     0.09       1.31         -40
Phaeophytin                      mg/l   0.28     0.06       0.88         +215

It is anticipated that during a major environmental perturbation more than one variable will be impacted. The amount of change in a given variable required for statistical detection would then be expected to be considerably less. To examine this empirically, the same data base was tested with various impact levels introduced. Each testing series consisted of the evaluation of data sets for which the values of various combinations of variables were changed by factors from 10% to 100% of the MDI levels.

Series were tested for 1, 2, 3, 4, 5, 10 and 16 variables changed at the same time. For the series involving 1, 2 and 16 variables, all possible combinations were tested at each impact level. Since the numbers of combinations of variables in the other series were too large to allow the examination of all of them, ten randomly selected combinations were tested.
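The combination design can be sketched as follows, assuming `stat` is any function that returns the $\chi^2$ of equation (1) for a sample (for instance, a Mahalanobis-distance implementation). The enumeration cutoff of 200 combinations and the function name are hypothetical; with 16 variables this cutoff happens to enumerate the 1-, 2- and 16-variable series completely (16, 120 and 1 combinations) and to fall back to ten random draws for the 3-, 4-, 5- and 10-variable series, matching the design described above.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence


def mean_stat_for_combinations(
    means: Sequence[float],
    mdi: Sequence[float],
    k: int,
    fraction: float,
    stat: Callable[[List[float]], float],
    max_enumerate: int = 200,
    n_random: int = 10,
    seed: int = 0,
) -> float:
    """Mean test statistic when k variables are changed by `fraction` of their MDIs.

    mdi holds signed proportions of the means (e.g. -0.30 for dissolved
    oxygen, +1.05 for C.O.D.). All combinations are enumerated when there
    are at most `max_enumerate` of them; otherwise `n_random` random
    combinations are drawn with a seeded generator.
    """
    p = len(means)
    if math.comb(p, k) <= max_enumerate:
        combos = list(itertools.combinations(range(p), k))
    else:
        rng = random.Random(seed)
        combos = [tuple(rng.sample(range(p), k)) for _ in range(n_random)]
    values = []
    for combo in combos:
        x = list(means)  # unaffected variables stay at their means
        for j in combo:
            x[j] = means[j] * (1.0 + fraction * mdi[j])
        values.append(stat(x))
    return sum(values) / len(values)
```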

The mean values for each impact level tested in each series are presented in Figure 1. The most obvious trend is that the more variables are changed, the smaller the impact on any given variable required to produce a statistically significant value. In fact, for virtually all cases involving impacts to multiple variables, changes much greater than 60% of the MDI produce significant $\chi^2$ values. On the other hand, tests involving impacts of 30% of the MDI or less do not produce significant $\chi^2$ values, even when all variables are involved. Therefore, the MDI factors are conservative estimates of statistically significant impacts: when more than one variable is affected, they appear to be two to three times the levels actually required to produce a significant result.


Figure 1. Mean value versus the percent of the MDIs for a series of empirical tests. The data points for series involving 1, 2 and 16 variables impacted at once represent the means of all possible combinations, while those for the other series (3, 4, 5 and 10 variables impacted at once) represent the means of tests of ten randomly selected combinations for each impact level.


MDIs  For  Data  Sets 

The most common circumstance for the examination of data for impacts involves the collection of multiple samples in a trend assessment study. Therefore, it is desirable to develop statistical models for the evaluation of entire post-impact data sets (e.g., water quality data from a cruise taken after a disposal site becomes active).

Multivariate statistical models are very useful in the determination of patterns in baseline data, as well as in the detection of impacts in trend assessment data (Alden et al., 1982). Such tests provide a single answer concerning the data set and avoid the multiple tests required by the univariate approach. Multiple univariate tests in the context of a monitoring program mean that numerous false alarms (i.e., Type I errors) will crop up over time by chance alone. Therefore, the models for post-impact data sets are based upon various multivariate statistical approaches.
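The false-alarm problem can be made concrete with a short calculation. Assuming, for simplicity, that the univariate tests are independent (real water quality variables are correlated, so this is only an approximation), the chance of at least one false alarm among $m$ tests at level $\alpha$ is $1 - (1 - \alpha)^m$, while the expected number of false alarms is $\alpha m$ regardless of independence. The function names are illustrative.

```python
def prob_any_false_alarm(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error among n independent tests."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_alarms(alpha: float, n_tests: int) -> float:
    """Expected number of Type I errors (no independence assumption needed)."""
    return alpha * n_tests


if __name__ == "__main__":
    # e.g. 16 water quality variables tested one at a time on a single cruise
    print(round(prob_any_false_alarm(0.05, 16), 3))
    print(expected_false_alarms(0.05, 16))
```

With 16 variables tested at $\alpha = 0.05$, the chance of at least one false alarm on a single cruise is already about 0.56, and it keeps growing as cruises accumulate.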

Statistical models for trend assessment studies may fall into several major categories: baseline, seasonal, and seasonal-area interaction contrasts. The baseline contrast models evaluate differences between the data from a post-impact collection and the entire baseline data set. The seasonal models compare the post-impact data with data collected during the same season under baseline conditions. The seasonal models are generally more sensitive than the baseline models because natural season-to-season variations are not included in the error terms of the seasonal contrasts.

The approach that is potentially the most sensitive at detecting impacts involves the seasonal-area interaction models. This type of model assumes that there is an area which, during the post-impact period, exhibits conditions identical to those that would have occurred in the impacted area had no perturbation occurred. Therefore, seasonal and aperiodic temporal variations are "filtered out" statistically before impacts are evaluated. Green (1979) describes various aspects of these models in detail.

The types of effects tested in a seasonal-area interaction model, shown schematically in Figure 2, best fit the study of an open ocean disposal site (the Norfolk Disposal Site, "NDS"). Effect 1 represents the comparison between water quality at the disposal site and that of surrounding waters. Effect 2 is the difference between surface and bottom water samples. The third effect is the expected seasonal change. The fourth and fifth effects are the responses of interest: the impacts caused by disposal operations at the site. By correcting for Effects 1-3 prior to the statistical evaluation of the impact effects, much of the natural spatiotemporal variation can be eliminated from the analysis.

As in the case of the single sample, the MDIs for data sets are expressed as the factors that just produce statistically significant results in post-impact data sets. The MDIs are proportions of the means of the variables, which may be positive (for enhancement) or negative (for inhibition), depending upon the expected responses to the disposal operations. Factors are added iteratively to the variables in the unimpacted data set until a statistically significant test criterion results for each.

Figure 2. Schematic presentation of the effects tested in seasonal-area interaction models. The inner circles represent data from the disposal site, while the other circles represent the "control" data from surrounding areas. The solid-line circles represent surface water data, while the broken-line circles represent bottom water data. The arrows represent the effects being tested. (See text for more detailed explanation.)

Two groups are compared with the baseline and seasonal models: the reference or control group, and the "impacted" data set. Therefore, the MDIs for the variables are based upon the F-value from a single-classification ANOVA model for unequal sample sizes. The F-value test criterion is defined as:

$$F_{[a-1,\,n-a]} = \frac{SS_g/(a-1)}{MS_w} \qquad (2)$$

where $n$ is the total number of samples, $a = 2$ for the two groups, and $MS_w$ is the within-group mean square (the variance of the groups). $SS_g$ is the sum of squares for groups, and is defined as:

$$SS_g = \left(\bar{X}_B^2 R_B + \bar{X}_A^2 R_A\right) - \frac{\left(\bar{X}_B R_B + \bar{X}_A R_A\right)^2}{n} \qquad (3)$$

where $\bar{X}_B$ and $\bar{X}_A$ are the mean values of a variable in the before and after data, respectively, and $R_B$ and $R_A$ are the numbers of replicates in the two sets (so that $n = R_B + R_A$). $\bar{X}_B$ is calculated for the seasonal or baseline data set, while $\bar{X}_A$ is the mean for a data set to which the incremented factors have been added iteratively until a significant F-value is produced.
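A minimal sketch of equations (2) and (3) with the iterative search follows. It assumes that $MS_w$ is known from the baseline data, that the "impact" is applied as a proportion of the mean of the unimpacted comparison set (`mean_a0`), and that the caller supplies the critical F (for example, about 8.29 for 1 and 18 degrees of freedom at $\alpha = 0.01$). The names are hypothetical.

```python
from typing import Optional


def ss_groups(mean_b: float, r_b: int, mean_a: float, r_a: int) -> float:
    """Sum of squares for groups, equation (3), with n = R_B + R_A."""
    n = r_b + r_a
    return (mean_b ** 2 * r_b + mean_a ** 2 * r_a) - (mean_b * r_b + mean_a * r_a) ** 2 / n


def f_value(mean_b: float, r_b: int, mean_a: float, r_a: int,
            ms_w: float, a: int = 2) -> float:
    """Equation (2): F = (SS_g / (a - 1)) / MS_w."""
    return (ss_groups(mean_b, r_b, mean_a, r_a) / (a - 1)) / ms_w


def two_group_mdi(mean_b: float, r_b: int, mean_a0: float, r_a: int, ms_w: float,
                  direction: int, f_crit: float, step: float = 0.05,
                  max_factor: float = 50.0) -> Optional[float]:
    """Smallest factor (a multiple of `step`) for which F exceeds f_crit.

    direction is +1 (enhancement) or -1 (inhibition); None is returned if
    no factor up to max_factor is significant.
    """
    k = 1
    while k * step <= max_factor + 1e-12:
        factor = k * step
        mean_a = mean_a0 * (1.0 + direction * factor)
        if f_value(mean_b, r_b, mean_a, r_a, ms_w) > f_crit:
            return factor
        k += 1
    return None
```

Equation (3) is algebraically equal to $R_B R_A (\bar{X}_B - \bar{X}_A)^2 / n$, which gives a convenient check on any implementation.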

The method for calculating the MDIs for a seasonal-area interaction model is somewhat more complicated, but is based upon a similar approach. Green (1979) presents a method for determining the number of replicates required in a sampling regime to detect statistically a given level of change in the variables between a "control" and an "impacted" area. The method for determining MDIs essentially examines the converse of this situation: the level of a statistically significant "impact" is determined for a given sampling regime (known number of replicates) and a predetermined level of spatiotemporal variability. Using the terms employed by Green (1979) and omitting several steps, the test criterion for the interaction effect is:

$$F_{[1,\,4(R-1)]} = \frac{SS_{INT}}{MS_w} \qquad (4)$$


where $SS_{INT}$ is the interaction sum of squares; $MS_w$ is the error or within-group mean square, which is assumed to be equal to $S_w^2$, the variance of the groups; $Z_{BC}$, $Z_{BI}$, $Z_{AC}$ and $Z_{AI}$ are the means for the control (C) and impact (I) areas before (B) and after (A) perturbation; and $R_{BC}$, $R_{BI}$, $R_{AC}$ and $R_{AI}$ are the replicates for the various groups. Green assumed that $Z_{BC} = Z_{BI} = Z_{AC}$ and that $Z_{AI} = Z_{BI} - (\text{impact factor})$; that the variance term for all groups equaled $S_w^2$; and that the number of replicates $R$, the term for which he solved, was equal in all groups.

In the MDI method, the mean value for each group, as well as a pooled variance term, is calculated, and the actual number of replicates for each group is used. Since MDIs are calculated prior to an actual impact, $Z_{AI}$ contains no impact, so an "impact" is introduced: $Z_{AI}^* = Z_{AI} + (Z_{AI} \times \text{impact factor})$.

The $Z_{AI}^*$ values are adjusted by incremental changes in the impact factor until a significant F-value is produced through iterative tests employing equation (4). When a significant interaction term is produced, the impact factor becomes the MDI for the variable being tested. The computer program for evaluating MDIs accepts data sets from the disposal site and surrounding waters from two seasons and sequentially evaluates the variables to produce an MDI value for each. Thus, a data set with a great deal of natural spatiotemporal variation and a relatively low level of replication will require a greater change before statistical significance is achieved (i.e., a set of larger MDI values is produced).
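The seasonal-area version can be sketched in the same way. This sketch assumes the standard contrast sum of squares for a 2 × 2 interaction in a cell-means model, $(Z_{BC} - Z_{BI} - Z_{AC} + Z_{AI})^2 / \sum_g (1/R_g)$, which reduces to $R/4$ times the squared contrast when every group has $R$ replicates; it may differ in detail from the expression Green (1979) uses. The group labels, step size and caller-supplied critical value (for example, about 8.53 for 1 and 16 degrees of freedom at $\alpha = 0.01$) are illustrative assumptions.

```python
from typing import Dict, Optional

GROUPS = ("BC", "BI", "AC", "AI")  # before/after x control/impact (labels assumed)


def interaction_ss(z: Dict[str, float], r: Dict[str, int]) -> float:
    """Contrast sum of squares for the 2 x 2 interaction (1 df)."""
    contrast = z["BC"] - z["BI"] - z["AC"] + z["AI"]
    return contrast ** 2 / sum(1.0 / r[g] for g in GROUPS)


def interaction_f(z: Dict[str, float], r: Dict[str, int], ms_w: float) -> float:
    """Equation (4): F = SS_INT / MS_w, with MS_w the pooled within-group variance."""
    return interaction_ss(z, r) / ms_w


def interaction_mdi(z: Dict[str, float], r: Dict[str, int], ms_w: float,
                    direction: int, f_crit: float, step: float = 0.05,
                    max_factor: float = 50.0) -> Optional[float]:
    """Smallest signed impact factor for which Z_AI* = Z_AI + Z_AI * factor
    gives an interaction F above f_crit; None if none up to max_factor."""
    k = 1
    while k * step <= max_factor + 1e-12:
        factor = direction * k * step
        z_star = dict(z)
        z_star["AI"] = z["AI"] + z["AI"] * factor
        if interaction_f(z_star, r, ms_w) > f_crit:
            return factor
        k += 1
    return None
```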

The MDI values for the baseline, seasonal and seasonal-area interaction models for data from two (August and October) water quality cruises taken as part of a baseline monitoring program at the Norfolk Disposal Site are presented in Table II. In general, the MDI values for the variables under the baseline model are greater than those for the seasonal model, which, in turn, are larger than those for the seasonal-area interaction model. The pattern is especially apparent for the parameters that display the greatest variability, and thus have the greatest MDI levels. This trend is due to the fact that more of the spatiotemporal variability is accounted for by the interaction model. However, the greater degrees of freedom in the baseline model and, to a lesser extent, in the seasonal model partially offset this pattern. The trend would be greater if year-to-year variations were included (i.e., the evaluation of two years of baseline data, or of data from the same seasonal period in different years). The seasonal-area model accounts for this source of variation as long as a natural change does not occur in one of the areas without also taking place in the other. Presumably, the amount of added variation observed in a continuing baseline monitoring program would approach an asymptote over time, so the MDI levels should stabilize. This trend would be useful to analyze in order to determine the level of effort required to "get a handle" on natural spatiotemporal variability prior to trend assessment studies. Graphically or statistically relating the MDI values to the cumulative sampling effort (e.g., the period, the number of cruises, or the number of samples in the data set) may provide useful insight into the design of effective baseline programs. Of course, long-term natural trends (e.g., regional degradation in water quality over decades) would tend to confound these findings.


Table II. Results of MDI analysis for baseline, seasonal and seasonal-area interaction models.

                                 PREDICTED MDIs (% OF MEANS)
PARAMETER                        BASELINE   SEASONAL   SEASON-AREA INTERACTION
Dissolved oxygen                 -30%       -30%       -15%
pH                               -5%        -5%        -5%
Chemical Oxygen Demand           25%        25%        25%
Turbidity                        120%       120%       120%
Nitrate-Nitrite                  325%       210%       *
Orthophosphate                   800%       400%       200%
Total Phosphorus                 <3000%     <3000%     200%
Total Kjeldahl Nitrogen          230%       10%        5%
Ammonia                          60%        45%        35%
Suspended Solids                 15%        10%        15%
Volatile Nonfilterable Residue   15%        15%        10%
Chlorophyll a                    -25%       -20%       -25%
Chlorophyll a (corrected)        -25%       -25%       -25%
Chlorophyll b                    -45%       -15%       -15%
Chlorophyll c                    -40%       -5%        -5%
Phaeophytin                      700%       300%       111%

*At least one group of values in the model was entirely below detection limits.

Empirical  Tests  of  MDI  Levels  with  Multivariate  Models 

Multivariate statistical techniques have been employed to characterize spatiotemporal water quality patterns of an ocean disposal site under baseline conditions (Alden et al., 1982). Similar methods can be employed in environmental monitoring studies to confirm the MDI levels during the baseline phase and, ultimately, to detect significant environmental impacts during the trend assessment phase. The techniques employed include principal components analysis (PCA), discriminant analysis, and multivariate analysis of variance (MANOVA). For demonstration purposes, the seasonal MDI levels presented in Table II will be evaluated using each of these techniques.

Green (1979) presents a method for assessing environmental impacts based upon a PCA of baseline data. Axes representing the first two factors, which account for most of the variance in the data, are plotted on a graph, and a probability ellipse is calculated. The probability ellipse for any desired alpha level defines the boundary between a region of data points considered to be statistically similar to the baseline data and those considered to represent a significant impact. The PCA scores for each new sample or set of samples are calculated and evaluated with respect to the ellipse.
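A minimal sketch of the ellipse test follows, assuming the PCA itself has already been carried out (for example, with a linear algebra library) and only the baseline scores on the first two factors are passed in. It also assumes a chi-square limit with 2 degrees of freedom, whose critical value has the closed form $-2\ln\alpha$; Green's ellipse may be constructed with a different (e.g. F-based) limit, so this is an approximation, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def chi2_2df_critical(alpha: float) -> float:
    """Upper-alpha critical value of chi-square with 2 df (exact closed form)."""
    return -2.0 * math.log(alpha)


def ellipse_distance(point: Point, baseline: Sequence[Point]) -> float:
    """Squared generalized distance of `point` from the baseline score cloud."""
    n = len(baseline)
    m1 = sum(p[0] for p in baseline) / n
    m2 = sum(p[1] for p in baseline) / n
    s11 = sum((p[0] - m1) ** 2 for p in baseline) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in baseline) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in baseline) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d1, d2 = point[0] - m1, point[1] - m2
    # Quadratic form d' S^-1 d using the closed-form inverse of a 2 x 2 matrix.
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def inside_ellipse(point: Point, baseline: Sequence[Point], alpha: float = 0.01) -> bool:
    """True if the point lies within the (1 - alpha) probability ellipse."""
    return ellipse_distance(point, baseline) <= chi2_2df_critical(alpha)
```

Passing the mean scores of a post-impact data set as `point` corresponds to the kind of comparison plotted for the Norfolk Disposal Site data.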

The means of various data sets plotted in relation to a probability ellipse defined by the set of baseline data from the Norfolk Disposal Site monitoring program are presented in Figure 3. The sets evaluated include data from the August and October cruises, as well as the October disposal site data adjusted by factors of 50%, 100%, 200%, 300%, 500%, 600% and 700% of the estimated MDI values for each variable. The mean values of the simulated data sets do not exceed the 99% probability region until changed by factors of over seven times the MDI levels. The method therefore appears to be quite insensitive to changes in the data. However, this is not too surprising, since PCA tends to emphasize the patterns of variability within the data, so any impact must be relatively large before it becomes statistically significant under this model.

Figure 3. Mean values of standardized Principal Component Analysis (PCA) scores for August and October data and simulated data sets containing various levels of impact (50%, 100%, 200%, 300%, 500%, 600% and 700% of the MDI values for each variable) plotted on the 99% probability ellipse for the first two factors of a PCA of the baseline data set. The relative loadings of the original variables on the axes are indicated.

The PCA factors may represent combinations of variables that may be affected differently by any given impact scenario. For example, the MDIs in Table II were calculated based upon the assumption that oxygen, pH and plant pigments would decrease following an impact, while the other variables would increase. Therefore, PCA factors loading on both "enhanced" and "inhibited" variables would not change very readily. This is the case for both factors (Figure 3). In addition, the relative "sensitivity" of the model (i.e., the ease with which a change is detected) depends upon the relative location of the season within the baseline ellipse. In other words, if an August cruise had been impacted, the impact would have been detected for a far smaller relative change from ambient conditions. Although this phenomenon makes it more difficult to assess the MDI levels (which are on a relative scale) with this model, deviations in the absolute concentrations of the variables from the baseline conditions would likely be detected.

A second method involves the use of discriminant analysis in the comparison of "reference" with "impacted" data in either a baseline or seasonal model. A stepwise discriminant analysis procedure, such as that described by Klecka (1975), forms a discriminant function based upon the variables that best separate the two groups. Figure 4 and Table III present the results of discriminant analyses of baseline and seasonal models with "impacted" data adjusted by factors that were 0%, 50% and 100% of the MDI levels for each variable. Unlike the PCA models, the discriminant analyses are very sensitive. Unfortunately, the discriminant techniques are actually too sensitive: significant "impacts" were indicated even when none were introduced (0% MDI) (Table III). This was also not surprising, since discriminant techniques emphasize differences between groups, even if they represent natural patterns. MANOVA models can be used to parallel the discriminant methods.

Figure 4. Frequency histograms of discriminant function scores for baseline and "impacted" data: (a) 0% MDI seasonal model; (b) 50% MDI seasonal model; (c) 100% MDI seasonal model (see text for details of models and specific tests).

A pre-disposal versus post-disposal effect is evaluated with respect to patterns in all variables with MANOVA techniques such as those described by Hull and Nie (1981). The results of these analyses are also indicated in Table III. Unlike the discriminant analysis, the MANOVA procedure did not indicate significant differences (at the $\alpha = 0.01$ level) between the "reference" and the 0% MDI data set for the seasonal model, although the mean of the 0% MDI data set was significantly different from the grand mean in the baseline model. The other "impacted" data sets (50% and 100% MDI) were shown to be very highly significantly different from the "reference" sets for both models.
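For two groups, the one-way MANOVA criteria (Wilks' lambda and the related statistics) are all equivalent to Hotelling's $T^2$, so a pre- versus post-disposal comparison can be sketched without a statistics package. The sketch below is a generic two-sample $T^2$ with a pooled covariance matrix; it is not the procedure of Hull and Nie (1981), and the function names are hypothetical. The resulting F has $p$ and $n_1 + n_2 - p - 1$ degrees of freedom and would be compared with a critical value from tables.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def hotelling_two_sample(g1: Sequence[Sequence[float]],
                         g2: Sequence[Sequence[float]]) -> Tuple[float, float, int, int]:
    """Return (T^2, F, df1, df2) for two groups of p-variate observations."""
    n1, n2, p = len(g1), len(g2), len(g1[0])
    m1, m2 = _mean(g1), _mean(g2)
    # Pooled within-group covariance matrix.
    s = [[0.0] * p for _ in range(p)]
    for rows, m in ((g1, m1), (g2, m2)):
        for row in rows:
            d = [row[i] - m[i] for i in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    s = [[v / (n1 + n2 - 2) for v in row] for row in s]
    diff = [a - b for a, b in zip(m1, m2)]
    quad = sum(di * yi for di, yi in zip(diff, solve(s, diff)))
    t2 = (n1 * n2 / (n1 + n2)) * quad
    df1, df2 = p, n1 + n2 - p - 1
    f = t2 * df2 / (p * (n1 + n2 - 2))
    return t2, f, df1, df2
```

With a single variable, $T^2$ reduces to the square of the pooled two-sample t statistic, which is a quick sanity check.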

There is some question as to whether it is appropriate to utilize tests which statistically compare the mean of a post-impact data set with that of the total baseline or composite seasonal set. As demonstrated in Table III, it is entirely conceivable that the mean discriminant function or canonical variate value of data from any given cruise could be different from the grand mean due to natural variations alone. Therefore, a second test of significance was sought for the baseline and seasonal models.

The discriminant analysis procedure described by Klecka (1975) includes a classification probability statistic, $p(x/g)$, which is defined as the probability that any given case belongs to the group with which it has been identified. The statistic is based upon a chi-square (see Figure 1):

$$\chi^2 = (f - \bar{f}_j)'\, D_j^{-1}\, (f - \bar{f}_j) \qquad (5)$$

where $f$ is the discriminant score of the case being tested, $\bar{f}_j$ is the mean discriminant score for the group in question, and $D_j$ is the variance-covariance matrix, which reduces to the variance of scores for group $j$ when there is a single discriminant function. The same statistic, which is analogous to that used by Green (1979) (equation 1) to calculate the probability ellipse, can be used to define probability limits for the discriminant scores of each group. The scores of each of the cases from the impact data set can then be evaluated by equation (5) to determine whether they fall within the 99% probability limits of the baseline or seasonal "reference" data sets. This procedure was used for the data sets evaluated by the discriminant models (Table III). The selection of a specific cut-off criterion, in terms of the percentage of cases outside the probability limits necessary to define an "impact", is somewhat subjective. A situation in which over 1% of the cases fall outside the probability ellipse may be suspect. Once the mean discriminant function (D.F.) score of the post-impact data falls outside the probability region of the baseline conditions, the majority of the cases in the data set are likely to do so as well, so this criterion has been tentatively adopted in the evaluation of MDI levels. The horizontal bars above the histograms in Figure 4 show the relationships between the mean D.F. scores and probability limits for each data set. As indicated by Table III, few of the 0% MDI cases fall outside the probability limits for the "reference" groups, while all do for the 100% MDI data set, with the 50% MDI tests exhibiting intermediate results (e.g., 40% and 75% for the seasonal and baseline models, respectively).
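With a single discriminant function, equation (5) reduces to $(f - \bar{f}_j)^2 / s_j^2$ with one degree of freedom, whose upper-tail probability can be written with the complementary error function. The sketch below, with hypothetical names and discriminant scores assumed to be computed elsewhere, reports the percentage of "impacted" cases that fall outside the 99% limits of a reference group.

```python
import math
from statistics import mean, variance
from typing import Sequence


def chi2_1df_upper_tail(x: float) -> float:
    """P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def percent_outside(reference: Sequence[float], impacted: Sequence[float],
                    alpha: float = 0.01) -> float:
    """Percentage of impacted D.F. scores outside the (1 - alpha) reference limits."""
    f_bar = mean(reference)
    s2 = variance(reference)  # sample variance of the reference scores
    outside = sum(
        1 for f in impacted
        if chi2_1df_upper_tail((f - f_bar) ** 2 / s2) < alpha
    )
    return 100.0 * outside / len(impacted)
```

For a reference group with scores $-1, 0, 1$ (mean 0, variance 1), impacted scores of 0, 2 and 3 give upper-tail probabilities of about 1.0, 0.046 and 0.003, so only the last case lies outside the 99% limits.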

The MANOVA season-area interaction model essentially asks the question:
"Are there post-impact changes at the disposal site which are not observed
elsewhere and are not accounted for by previously observed spatial
differences?" The results of the MANOVA analysis of the August, October and
simulated data sets are also presented in Table III. For the 0% MDI data set,
the multivariate test showed no significant interaction effect (p = 0.35),
and the univariate tests of the individual variables provided by the
procedure indicated that no more were significant than would be expected by
chance alone. The 50% MDI data set produced a very highly significant
multivariate interaction effect, while some of the variables were
significant on their own. The interaction effect in the test of the 100% MDI
set was significant for the multivariate test and for all of the univariate
tests of the individual variables. Thus, with the MANOVA interaction model,
changes can be detected at impact levels below the predicted MDIs with the
multivariate approach, and at levels within the range of the expected MDIs
with the univariate approach.
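The season-area interaction question can be illustrated, one variable at a time, by the interaction contrast of cell means: the change at the disposal site minus the change at the reference area over the same period. The sketch below is hypothetical (the area and period labels and the data are invented for illustration) and computes only the point estimates; the MANOVA procedure itself tests these contrasts jointly against the pooled within-cell covariance matrix, which is not shown here.

```python
from statistics import mean

def interaction_contrasts(cells):
    """cells maps (area, period) -> list of observation vectors (one value per variable).
    Returns, per variable: (disposal post - disposal pre) - (reference post - reference pre).
    A nonzero value is a change at the disposal site not mirrored at the reference area
    and not explained by a fixed spatial difference between the two areas."""
    def cell_means(key):
        return [mean(column) for column in zip(*cells[key])]

    d_pre, d_post = cell_means(("disposal", "pre")), cell_means(("disposal", "post"))
    r_pre, r_post = cell_means(("reference", "pre")), cell_means(("reference", "post"))
    return [(dq - dp) - (rq - rp) for dp, dq, rp, rq in zip(d_pre, d_post, r_pre, r_post)]

# Hypothetical data: two variables, two observations per cell.
cells = {
    ("disposal", "pre"): [[1.0, 10.0], [3.0, 12.0]],
    ("disposal", "post"): [[6.0, 11.0], [8.0, 13.0]],
    ("reference", "pre"): [[2.0, 20.0], [4.0, 22.0]],
    ("reference", "post"): [[3.0, 21.0], [5.0, 23.0]],
}
print(interaction_contrasts(cells))
```

The second variable differs strongly between areas (about 11 versus 21) but changes equally in both, so its contrast is zero; only the first variable shows a disposal-site-specific change.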

As with the MDIs for single samples, it is conceivable, in fact quite
likely, that not all variables would be impacted at the same time.
Therefore, it would be useful to examine how the level of impact and the
number of variables affected relate to the statistical significance of the
test. A graph depicting an empirical relationship analogous to the one
presented in Figure 1 for single samples would be desirable, but the large
number of MANOVA tests involved would be too time-consuming and costly in
computer usage to allow the same number of permutations to be evaluated.
Therefore, tests involving various levels of impact were run for the two
extreme situations (i.e. a single impacted variable, or all variables
impacted), under the assumption that other combinations would fall somewhere
between the two curves. Figure 5 presents the results of this series of
analyses, plotting the percent of the MDI level against the multivariate F
produced by the MANOVA procedure. The curve for the single impacted variable
represents the means of ten tests of variables randomly selected from the
data set to be adjusted by each given impact level. As with the results from
the single sample tests, the curves appear exponential, and it appears that
most situations involving combinations of impacted variables would require
an average level of impact of 30-40% of the MDIs in order to be detected
statistically by the MANOVA model. Although combinations of variables with
different levels of impact may complicate the situation to a degree, it
seems intuitive that they should fall somewhere between the two empirical
curves. Thus, as for the single sample case, the predicted MDI levels appear
to be conservative estimators of statistically significant impacts, but
within the same order of magnitude as a change that would be detectable by a
powerful multivariate statistical test.
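The two extreme impact scenarios described above (one randomly chosen variable, or all variables) can be generated by shifting the post-impact disposal-site observations by a given percentage of each variable's MDI. This is a minimal sketch, assuming an additive upward shift; the direction and additive form of the adjustment, and the helper names, are assumptions. Each adjusted data set would then be passed to a MANOVA routine, which is omitted here.

```python
import random

def apply_impact(rows, mdi, percent, variables):
    """Return a copy of `rows` (observation vectors) with each variable index in
    `variables` shifted upward by `percent`% of that variable's MDI."""
    shifted = []
    for row in rows:
        new_row = list(row)
        for j in variables:
            new_row[j] += mdi[j] * percent / 100.0
        shifted.append(new_row)
    return shifted

def random_single_variable_trials(rows, mdi, percent, n_trials=10, seed=0):
    """One adjusted data set per trial, each with a single randomly chosen
    variable impacted (mirroring the ten randomly selected variables per level)."""
    rng = random.Random(seed)
    return [apply_impact(rows, mdi, percent, [rng.randrange(len(mdi))])
            for _ in range(n_trials)]

# Hypothetical post-impact observations (two variables) and MDIs.
rows = [[1.0, 2.0], [3.0, 4.0]]
mdi = [10.0, 4.0]
all_impacted = apply_impact(rows, mdi, 50, variables=range(len(mdi)))
single_trials = random_single_variable_trials(rows, mdi, 50)
```

Keeping the original rows untouched and returning adjusted copies lets the same baseline data be reused for every impact level and every random selection.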


EVALUATION  OF  STATISTICAL  METHODS


One potential problem with the types of multivariate statistical models
described here is the effect of heterogeneity of variances and
non-normality on the test results. In order to evaluate this effect, a
series of empirical tests was made using data sets simulated to match the
distribution of a highly skewed sediment grain size data set. The method of
simulation is designed to produce data sets with desired distributions and
various means (Alden, 1984). In the evaluation test series, discriminant
analysis and MANOVA models were used to compare four groups of twenty cases
which were simulated to have identical means (i.e. a true null hypothesis).
Tests were run for each type of analysis on fifty independent simulations.
The nominal probability (α) values associated with the multivariate test
statistics were then compared with the observed probability levels. The
latter values were calculated as the cumulative proportion of the
observations found at each nominal level.

Figure 5. Multivariate F values versus the percent of the MDIs for a
series of empirical tests. The curve for single impacted variables is based
upon 10 tests of variables randomly selected from the data set to be
adjusted for each impact level (means indicated by closed circles, vertical
bars represent 2 standard errors). The curve labeled 16 is for all
variables being impacted at once.

The results of the comparisons of the observed versus nominal α levels for
the multivariate tests are presented in Figure 6. An interesting contrast
can be seen between the results of the discriminant analysis (Figure 6a, b)
and those from the MANOVA tests (Figure 6c). The results from the
discriminant analysis, the Wilks' lambda comparison of all groups and the
Mahalanobis comparisons of individual pairs of groups (Klecka, 1975),
clearly indicate a nonlinear relationship between the observed and nominal
levels. Observed α values are always higher than nominal levels, and
virtually all nominal α levels are below 0.20. In other words, the
discriminant analysis models are overly sensitive, producing test
statistics with "significant" nominal α levels even when no true
differences exist between the means.
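The observed probability levels described above (the cumulative proportion of null-simulation results at or below each nominal level) are straightforward to compute. In this sketch the p-values are hypothetical stand-ins for the results of the fifty simulation runs; a well-calibrated test would return proportions close to the nominal levels themselves.

```python
def observed_alpha(p_values, nominal_levels):
    """Cumulative proportion of null-simulation p-values at or below each nominal level.
    Under a true null hypothesis a well-calibrated test gives values near the nominal ones."""
    n = len(p_values)
    return [sum(p <= a for p in p_values) / n for a in nominal_levels]

# Hypothetical p-values from tests run on data with identical group means.
p_values = [0.01, 0.03, 0.04, 0.08, 0.12, 0.15, 0.30, 0.45, 0.70, 0.90]
print(observed_alpha(p_values, [0.05, 0.10, 0.20]))
```

In this invented example every observed proportion (0.3, 0.4, 0.6) exceeds its nominal level, which is the overly sensitive pattern reported for the discriminant analysis models.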

On the other hand, the observed and nominal α levels for the MANOVA model
were nearly equal throughout the entire range, producing a linear
relationship of good fit (R² = 0.99). Therefore, the MANOVA models appear to
produce the expected results despite a great deviation from multivariate
normality. Had the linear relationship not passed through the origin,
however, the regression equation could have been used to predict the
appropriate α level for any given test statistic (e.g. an F value producing
a nominal α level of 0.05 might in fact correspond to an actual level of
0.07).
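The regression correction described above can be sketched as an ordinary least-squares line of observed on nominal α, used to translate a nominal level into an estimated actual level, or inverted to find the nominal cutoff that yields a desired actual level. The calibration values below are hypothetical, chosen to reproduce the 0.05 to 0.07 example in the text, and the helper names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def actual_alpha(nominal, intercept, slope):
    """Estimated actual (observed) alpha for a given nominal alpha."""
    return intercept + slope * nominal

def nominal_cutoff(target_actual, intercept, slope):
    """Nominal alpha to use so that the estimated actual alpha equals target_actual."""
    return (target_actual - intercept) / slope

# Hypothetical calibration: observed alpha runs 0.02 above nominal throughout.
nominal = [0.01, 0.05, 0.10, 0.20]
observed = [0.03, 0.07, 0.12, 0.22]
intercept, slope = fit_line(nominal, observed)
print(actual_alpha(0.05, intercept, slope))    # about 0.07
print(nominal_cutoff(0.05, intercept, slope))  # about 0.03
```

In this invented calibration, an investigator wanting a true 5% error rate would require the nominal p-value to fall below about 0.03 rather than 0.05.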

Taken together, the results of the MDI evaluations and the simulation
evaluations of the statistical models make it clear that MANOVA represents a
"better" statistical model than discriminant analysis for the comparison of
environmental data sets. Discriminant analysis, while overly sensitive as a
test statistic, does have certain advantages: it can indicate which
variables are most responsible for differences between groups, and the
results can be readily displayed for presentation purposes (e.g. group
distributions plotted on a frequency histogram of a single discriminant
function, or probability ellipses of groups plotted on a graph of the first
two discriminant functions). Therefore, the results of a discriminant
analysis may be used for data presentation purposes, but the "significance"
of the patterns should be confirmed with a MANOVA test.

Figure 6. Observed versus nominal α levels for multivariate tests of data
sets (50 runs of 4 groups each) simulated to fulfill a true null
hypothesis: a) discriminant Wilks' lambda test; b) Mahalanobis
comparisons of individual pairs of groups; and c) MANOVA multivariate F
tests.


CONCLUSIONS 

A series of statistical approaches has been developed for the evaluation of
baseline and trend assessment data. For baseline data sets, models have been
developed which allow the estimation of "minimum detectable impacts" (MDIs)
for scenarios involving both single samples and entire data sets. The MDI
levels estimated for the data set selected for demonstration purposes
indicated that the monitoring program was effective in providing an "early
warning system" for the statistical detection of impacts before they become
excessively detrimental to the environment. In fact, when more than one
variable was affected at the same time, the relative change required in any
given variable was smaller (e.g. only 30-50% of the MDI values for most
multiple impacts). However, if the MDIs had represented conditions which
would be considered ecologically damaging, models for the estimation of the
appropriate sample size for the detection of any desired level of change
(e.g. see the methods described by Sokal and Rohlf, 1969, and Green, 1979)
could be used, and computer simulations of data sets with this level of
replication could be run through the MDI prediction methods to calculate new
MDIs. Multivariate statistical models could then be run to verify the new
MDI levels.
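The sample size step mentioned above can be illustrated with the familiar normal-approximation formula for a two-sided, two-sample comparison of means with a common standard deviation sigma: n = 2 (sigma/delta)^2 (z[1 - alpha/2] + z[power])^2 per group. This is a swapped-in textbook approximation (Sokal and Rohlf describe an iterative t-based version), not the MDI prediction method of this study; inverting it gives the smallest difference detectable with n replicates per group.

```python
from math import ceil, sqrt
from statistics import NormalDist

def _z(p):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)

def samples_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Replicates per group needed to detect a mean difference `delta`
    (normal approximation, two-sided test, equal group sizes)."""
    k = _z(1 - alpha / 2) + _z(power)
    return ceil(2 * (sigma / delta) ** 2 * k ** 2)

def detectable_difference(sigma, n, alpha=0.05, power=0.80):
    """Smallest mean difference detectable with n replicates per group
    (the same approximation solved for delta)."""
    k = _z(1 - alpha / 2) + _z(power)
    return k * sigma * sqrt(2 / n)

# Hypothetical: natural variability sigma = 1 (in the variable's own units).
print(samples_per_group(1.0, 1.0))      # replicates to detect a change of 1 sigma
print(samples_per_group(1.0, 0.5))      # halving the target change roughly quadruples n
print(detectable_difference(1.0, 16))
```

The inverse form makes the trade-off explicit: the detectable change shrinks only with the square root of the replication, which is why gains in sensitivity become expensive.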

A  cost-benefit  analysis  could  then  be  run  to  evaluate  the  relative 
amount  of  effort  (and  resources)  required  to  potentially  detect  any  given 
level  of  change  (i.e.  the  "cost"  of  the  various  monitoring  regimes  could  be 
plotted  against  the  predicted  MDI  levels  for  evaluation  purposes).  In  such 
an  analysis,  a  point  of  "diminishing  returns"  could,  perhaps,  be  observed 
and  the  optimum  regime  selected.  At  the  very  least,  the  investigator  would 
be  given  an  indication  of  the  potential  sensitivity  of  the  regime  which  has 
been  selected. 
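The cost-benefit comparison described above can be sketched by tabulating cost against predicted detectable change for several levels of replication, then flagging the point at which an additional unit of cost buys little further sensitivity. Everything here is hypothetical: the linear cost model, the cost figures, the threshold defining "diminishing returns", and the use of a normal-approximation detectable difference as a stand-in for the MDI.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(sigma, n, alpha=0.05, power=0.80):
    """Stand-in for the MDI: normal-approximation smallest two-sample mean
    difference detectable with n replicates per group."""
    k = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return k * sigma * sqrt(2 / n)

def diminishing_returns(sigma, fixed_cost, cost_per_replicate, n_values, min_gain_per_cost):
    """Tabulate (n, cost, detectable difference) and return the first n beyond which
    each extra unit of cost reduces the detectable difference by less than
    `min_gain_per_cost` (or the last n if that never happens)."""
    table = [(n, fixed_cost + cost_per_replicate * n, detectable_difference(sigma, n))
             for n in n_values]
    for (n0, c0, d0), (_, c1, d1) in zip(table, table[1:]):
        if (d0 - d1) / (c1 - c0) < min_gain_per_cost:
            return table, n0
    return table, table[-1][0]

# Hypothetical costs: a fixed survey cost plus a cost per replicate sample.
table, n_opt = diminishing_returns(sigma=1.0, fixed_cost=1000, cost_per_replicate=100,
                                   n_values=[2, 4, 8, 16, 32, 64], min_gain_per_cost=0.001)
for n, cost, mdd in table:
    print(n, cost, round(mdd, 3))
print("suggested replication:", n_opt)
```

Plotting the cost column against the detectable-difference column gives the kind of curve described in the text; the threshold rule is only one simple way to locate its "knee".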

The effectiveness of several multivariate techniques for the detection of
impacts in data from trend assessment studies was evaluated empirically. A
PCA model proved relatively insensitive in detecting significant changes,
while discriminant analysis models were overly sensitive, suggesting
significant differences when none were present. The MANOVA techniques proved
to be the most effective, particularly when the season-area interaction
models were used. Simulation evaluations confirmed the effectiveness and
robustness of the MANOVA models. These simulation evaluations can also be
used to define new critical test values for statistical models that do not
exhibit the nominal α values, thus freeing the statistical models from the
assumptions of multivariate normality and homogeneity of variances.
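The idea of defining new critical values by simulation can be sketched as follows: generate many data sets under a true null hypothesis from a skewed distribution, compute the test statistic for each, and take the empirical (1 - alpha) quantile as the critical value. For brevity this sketch uses a univariate Welch t statistic and lognormal data as stand-ins; the study applied the approach to multivariate statistics and grain-size-like distributions, so the statistic, distribution, and parameters here are all assumptions.

```python
import random
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def empirical_critical_value(n_per_group=20, n_sims=2000, alpha=0.05, seed=1):
    """Empirical two-sided critical value of |t| when both groups are drawn from the
    same skewed (lognormal) distribution, i.e. under a true null hypothesis."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        a = [rng.lognormvariate(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.lognormvariate(0.0, 1.0) for _ in range(n_per_group)]
        stats.append(abs(welch_t(a, b)))
    stats.sort()
    n_exceed = round(alpha * n_sims)       # simulations allowed to exceed the cutoff
    return stats[n_sims - n_exceed - 1]    # value with n_exceed statistics above it (ignoring ties)

crit = empirical_critical_value()
print(crit)  # compare with the normal-theory two-sided value of about 1.96
```

A statistic computed on real data would then be judged against this simulated cutoff instead of the tabulated one, so the test's actual error rate matches the intended alpha for data shaped like the simulated distribution.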